Add libreactor (C) — epoll engine entry #237
BennyFranciscus wants to merge 3 commits into MDA2AV:main from
Conversation
libreactor v3.0.0 — high-performance C event-driven HTTP library by Fredrik Widlund. Uses epoll + picohttpparser for minimal-overhead HTTP handling. One of the top performers on TechEmpower benchmarks. Engine entry with baseline, pipelined, and limited-conn tests.

- Single-threaded epoll event loop via libreactor's reactor core
- HTTP parsing via bundled picohttpparser
- Built from source with -O3 -march=native -flto
- Handles GET/POST with query params, Content-Length and chunked body
- All 7 validation checks pass
/benchmark baseline

🚀 Benchmark run triggered for

Benchmark Results
Framework:
Full log
Increase to 64 reactors or workers; it seems to be set to 1.
libreactor was running a single reactor loop on one core (~100% CPU). Now forks one worker per available CPU, each pinned to its own core via sched_setaffinity. libreactor's net_socket already sets SO_REUSEPORT, so each child binds independently.
Good catch — it was running a single reactor loop. Pushed a fix: now forks one worker per CPU core, each pinned via sched_setaffinity. Should saturate all 64 cores now.

/benchmark baseline

🚀 Benchmark run triggered for

Benchmark Results
Framework:
Full log
set cpus to 64 |
Already spawning one worker per detected CPU. Did you mean something else by "set cpus to 64"? Happy to adjust if there's a specific config you want.
Set the value to exactly 64 and don't pin them with sched_setaffinity.
Per MDA2AV's request: set worker count to exactly 64 and remove sched_setaffinity CPU pinning.
Done — hardcoded 64 workers and removed the sched_setaffinity pinning.
Hi, I'm the author. Nice to see a new benchmark. This is a somewhat old version of libreactor; I can perhaps submit the new version if that is interesting, which is based on io_uring rather than epoll. Is it correct that strategies to distribute load efficiently (affinity etc.) are considered against the rules?
Hi, it is allowed; I asked to disable it because it can result in worse performance overall, just to check. It would be great if you could submit the newer version. As it is classified as an engine, only the baseline, pipelined, and short-lived tests count toward the score. The baseline test is locked to 64 CPU threads (the CPU has 128: 64 physical cores, 128 SMT threads); the other tests aren't, but likely will be in the future for engines, to make it simpler to set the reactor/worker/CPU counts.
What is required to classify as production-grade? |
To be classified as a framework it needs to have all the basic framework features: routing, request parsing (not manually extracting Content-Length or Transfer-Encoding bodies), and preferably built-in middleware for compression. It is still rather subjective; at least 3-4 current entries on the framework list are likely to be moved into the engine section. You can still implement all tests even as an engine; they simply won't count toward its composite score. If you think your entry is framework/production level, just mark it as such and we will evaluate it.
@fredrikwidlund Hey, awesome to have the author here! 🎉 Thanks for checking it out. I submitted this based on the v2 release on GitHub — would love to see an io_uring-based version from you directly. If you want to open a fresh PR with the newer version, that would be the way to go. You know the library best so you'd be able to tune it properly. On affinity: yeah, MDA2AV asked me to remove it to test the baseline without pinning. Sounds like it's allowed though if it helps, so feel free to use it in your submission. Looking forward to seeing the updated version! |
It's here: https://github.com/fredrikwidlund/libreactorng but I will try to find time to add it, or an even later version. Affinity, if you don't use all cores (64 out of 128), can be tricky, so I'm not surprised if it ends up limiting performance. With io_uring it's also good to be careful that you don't end up having kernel workers spending "restricted" cycles on more cores than 64, which would be unfair of course. A TLS benchmark profile would be interesting to see as well.
Currently we run the benchmark on a single machine (some tests go up to 400 Gbit/s, and NICs would be extremely expensive for this), so both the servers and the load generator share CPU. This makes things a little trickier, so at least on the baseline test we cap to 64 cores to ensure that all frameworks have the same CPU load; otherwise frameworks with very high RPS would be limited because the load generator also uses more CPU. For other tests we let it rip, because the load generator has less impact. Our custom load generator, gcannon, does not support TLS yet (it is tricky with io_uring); we are considering a separate H1 TLS test with the wrk load generator, which isn't as performant but is better than nothing. We already support TLS in the H2, H3, and gRPC tests.
@fredrikwidlund Oh nice, libreactorng looks great — the io_uring rewrite is exactly what I was hoping to see. Would be really interesting to compare epoll vs io_uring versions side by side on the same benchmark. Good point on the kernel workers and core fairness with io_uring. That's something a lot of people miss when pinning to a subset of cores — the kernel side can easily spill over and skew results. No rush on the PR, but if you get around to it I'd be happy to help with the Dockerfile/config setup on the HttpArena side. |
libreactor
Adds libreactor v3.0.0 as an engine entry.
libreactor is a high-performance C event-driven library by @fredrikwidlund, built around epoll with picohttpparser for HTTP parsing. It's consistently one of the top performers on TechEmpower benchmarks — minimal abstraction, raw C, zero-copy patterns.
Implementation
Built from source with -O3 -march=native -flto.

Endpoints
/baseline (text/plain)
/pipeline (text/plain)

Validation
All baseline, POST (Content-Length + chunked), anti-cheat randomized inputs, and pipeline checks pass.
cc @fredrikwidlund — would love to see how libreactor performs on HttpArena's benchmarks!